In a recent paper, Kuperberg described the first subexponential time algorithm for solving
the dihedral hidden subgroup problem. The space requirement of his algorithm is
super-polynomial. We describe a modified algorithm whose running time is still subexponential and
whose space requirement is only polynomial.
1 Introduction

A central problem in quantum computation is the hidden subgroup problem (HSP). Here, we are
given a black box that computes a function on elements of a group $G$. The function is known to
be constant on each left coset of a subgroup $H \subseteq G$ and distinct on different cosets, and our
goal is to find $H$. Interestingly, most known quantum algorithms that provide a super-polynomial
advantage over classical algorithms solve special cases of the HSP on Abelian groups. There has also been considerable
interest in the HSP on noncommutative groups (see, e.g., EH5JQoH3|). For example, one important 
group is the symmetric group: it is known that solving the HSP on the symmetric group leads to
a solution to graph isomorphism [6].

In this paper we will be interested in the HSP on the dihedral group. The dihedral group of
order $2N$, denoted $D_N$, is the group of symmetries of an $N$-sided regular polygon. It consists
of $N$ rotations, which we denote by $(0, 0), \ldots, (0, N - 1)$, and $N$ reflections, which we denote
by $(1, 0), \ldots, (1, N - 1)$. It is isomorphic to the abstract group generated by the element $\rho$ of
order $N$ and the element $\tau$ of order 2 subject to the relation $\rho\tau = \tau\rho^{-1}$. Regev [9] showed that
under certain conditions, an efficient solution to the dihedral HSP implies a quantum algorithm
for lattice problems. This gives a strong incentive to find an efficient solution to the dihedral
HSP.

However, although the dihedral group is one of the simplest noncommutative groups, no
efficient solution to the dihedral HSP is known. Ettinger and Høyer [2] showed that one can
obtain sufficient statistical information about the hidden subgroup with only a polynomial number
of queries to the black box. However, no efficient algorithm is known that solves the HSP using this
information. In fact, it was shown in [8] that solving the HSP using this information is a hard problem
in a certain precise sense.

Recently, Kuperberg [7] presented the first subexponential time algorithm for the dihedral HSP.
Namely, his algorithm runs in time $2^{O(\sqrt{\log N})}$ (the input size is $O(\log N)$). This is currently the best
known algorithm for the dihedral HSP. However, in order to achieve this running time,
Kuperberg's algorithm requires $2^{O(\sqrt{\log N})}$ space. Essentially, this happens since the algorithm keeps
many qubits around until certain collisions occur. Our main result in this paper is an algorithm
that requires only polynomial space, i.e., $\mathrm{poly}(\log N)$. The running time of our algorithm is still
subexponential and only slightly higher than that of Kuperberg's algorithm, namely, $2^{O(\sqrt{\log N \log\log N})}$.

Our algorithm combines ideas from Kuperberg's algorithm [7] and a paper by Regev [9]. Our
classical abstraction of the problem is influenced by a paper by Blum, Kalai and Wasserman Q. 
We start in Section 2 with a simplified description of Kuperberg's algorithm. Then, in Section 3
we describe our new algorithm.

2 Kuperberg's Algorithm 

In this section we present a simplified description of Kuperberg's algorithm. We concentrate on 
the basic idea and try to omit some of the more technical issues. We start with describing an 
algorithm for a certain classical problem. We will later show that this algorithm corresponds 
exactly to Kuperberg's algorithm. 

2.1 A Classical Abstraction 

For simplicity, we only consider the case where $N = 2^n$ and $n = k^2 + 1$ for some integer $k$. The
algorithm can be modified to work without this assumption.

Let us consider the following classical scenario. We are dealing with 'objects' that are labelled
with numbers modulo $N = 2^n$ (eventually, these objects will turn out to be qubits, but let's forget
about that for now). Our goal is to obtain an object whose label is $2^{n-1}$. These objects are created
by a 'machine' that we have at our disposal. This machine outputs both an object and its label. We
are guaranteed that the machine outputs objects whose label is chosen uniformly at random from
$\{0, \ldots, 2^n - 1\}$. Each time we ask the machine for a new object we pay one time unit. So here is
our first algorithm: call the machine repeatedly until it happens to output an object whose label is
$2^{n-1}$. Clearly, this algorithm requires $O(2^n)$ time units on average.

It turns out that these objects have a nice property: given two objects, labelled with $a$ and $b$, we
can combine them and obtain a new object whose label is $a - b$ (the two original objects are gone).
This combination operation succeeds with probability 50%; with probability 50%, the operation
fails and then both original objects are gone. Let us now show how to obtain an algorithm whose
running time is $2^{O(\sqrt{n})}$. This is the basic idea underlying [7].

The overall structure of the algorithm is that of a 'pipeline' of $k$ routines, as in Figure 1. That
is, the input to routine $i + 1$ is the output of routine $i$. The inputs to routine 1 are 'fresh'
objects from the machine, i.e., objects whose labels are chosen uniformly at random. For any
$i = 1, \ldots, k$, the output of routine $i$ (and the input to routine $i + 1$) are objects whose labels have
the following distribution: the $ik$ least significant bits equal zero and the remaining $n - ik$ bits are
chosen uniformly at random. In other words, each routine is supposed to output objects whose
labels have $k$ additional bits zeroed out. Notice that since $n - k^2 = 1$, only one random bit remains
after the last routine (i.e., routine $k$), so with probability 50% it outputs an object whose label is $2^{n-1}$.

Figure 1: A pipeline of routines with $n = 10$, $k = 3$.

It remains to describe how to implement the routines. Let us describe routine $i$ for some
$i = 1, \ldots, k$. The routine maintains a pile of objects. Initially, the pile is empty. Whenever a new
object arrives, the routine compares the $k$ bits in positions $(i - 1)k + 1, \ldots, ik$ of its label to the
same bits in all of the objects currently in its pile (notice that the bits in positions $1, \ldots, (i - 1)k$
are guaranteed to be zero). If no match is found (i.e., no object currently in the pile has the same
setting of these $k$ bits) then the routine adds the new object to the pile. If a match is found then
the routine combines the new object with the matching object in the pile. With probability 50%, the
combination is successful and the routine outputs the resulting object. Notice that the $ik$ least
significant bits of the label of the resulting object are all zero. Moreover, the remaining $n - ik$ bits
are still random since the behavior of the routine does not depend on them.

Finally, let us show that the expected running time of the algorithm is indeed $2^{O(\sqrt{n})}$. In other
words, we will show that this is the amount of time it takes to obtain one object from the last
routine in the pipeline. The intuitive idea is the following. Initially, the pile of a routine is empty
and matches rarely occur. However, after around $2^k$ objects the pile gets rather full; from that
point on, the routine needs an average of four objects in order to produce one output object (since
we are combining two objects in order to produce one output object and our success probability
is 50%). Hence, the number of objects needed from the machine in order to produce one object by
the final routine is roughly $4^k$.

Let us make this argument more formal. First we observe that with very high probability, a
routine that gets as input $l \cdot 2^k$ objects for some $l \ge 8$ outputs at least $l/8 \cdot 2^k$ objects. This follows by
noting that at most $2^k$ of these objects can remain in the pile. On the remaining $(l - 1) \cdot 2^k$ objects,
the routine performs combination operations. The expected number of output objects is therefore
$(l - 1)/4 \cdot 2^k$; a simple application of the Chernoff bound shows that with very high probability,
the number of output objects is at least $l/8 \cdot 2^k$. We can now complete the proof by noting that if
the first routine is given $8^k \cdot 2^k = 2^{O(\sqrt{n})}$ objects then with very high probability the last routine
outputs at least one object (and in fact, at least $2^k$ objects).
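The pipeline of pile-based routines can be simulated classically by tracking labels only. The following Python sketch is an illustration under our own assumptions: the names `Routine` and `run_pipeline` are hypothetical, bit positions are 0-indexed, and the 50% success of a combination is modelled by a coin flip.

```python
import random


class Routine:
    """Routine i of the pipeline (1-indexed): zeroes out k more bits of each label."""

    def __init__(self, i, k, n, rng):
        self.shift = (i - 1) * k        # bits below this position are already zero
        self.block_mask = (1 << k) - 1  # selects the k bits this routine matches on
        self.modulus = 1 << n           # labels live modulo N = 2^n
        self.pile = {}                  # block value -> label waiting for a match
        self.rng = rng

    def feed(self, label):
        """Consume one object; return the label of an output object, or None."""
        key = (label >> self.shift) & self.block_mask
        if key not in self.pile:
            self.pile[key] = label      # no match yet: keep the object in the pile
            return None
        other = self.pile.pop(key)      # both objects are used up in either case
        if self.rng.random() < 0.5:     # the combination succeeds half the time
            return (label - other) % self.modulus
        return None


def run_pipeline(k, seed=0, max_queries=10**6):
    """Query the machine until the last routine emits the label 2^(n-1).

    Returns (number of machine queries, largest total pile size seen),
    or None if max_queries is exhausted first.
    """
    n = k * k + 1
    rng = random.Random(seed)
    routines = [Routine(i, k, n, rng) for i in range(1, k + 1)]
    target = 1 << (n - 1)
    largest_pile = 0
    for queries in range(1, max_queries + 1):
        obj = rng.randrange(1 << n)     # a fresh object with a uniform label
        for routine in routines:
            obj = routine.feed(obj)
            if obj is None:
                break
        largest_pile = max(largest_pile, sum(len(r.pile) for r in routines))
        if obj == target:
            return queries, largest_pile
    return None
```

Calling `run_pipeline(3)` corresponds to the setting of Figure 1 ($n = 10$, $k = 3$). The second returned value shows the storage cost of the piles, which can reach $k \cdot 2^k$ objects.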

2.2 The Quantum Setting 

We now show how to obtain from the above an algorithm for the dihedral hidden subgroup
problem. We are given oracle access to a function $f : D_N \to R$ from the dihedral group to some
arbitrary set $R$. The function is promised to be constant on cosets of some subgroup $H \subseteq D_N$
and distinct on different cosets. Our goal is to extract the subgroup $H$. Ettinger and Høyer [2]
showed that it is enough to solve the problem for the case where $H = \{(0, 0), (1, d)\}$ is generated
by a reflection $(1, d)$. Hence, our goal now is to find $d$, a number between 0 and $2^n - 1$.



In fact, finding the least significant bit of $d$ is enough. Indeed, let us show how to find $d$ given
an algorithm that only finds the least significant bit of $d$. We start by calling the algorithm once
with the given oracle. This allows us to obtain the least significant bit of $d$. Assume the answer
is '0'. Then, consider the function $f' : D_{N/2} \to R$ given by $f'(a, b) := f(a, 2b)$. Notice that this
function hides the subgroup $\{(0, 0), (1, d/2)\}$ of $D_{N/2}$. Similarly, if the answer is '1', consider the
function $f'' : D_{N/2} \to R$ given by $f''(a, b) := f(a, 2b + a)$. This function hides the subgroup
$\{(0, 0), (1, (d - 1)/2)\}$ of $D_{N/2}$. We can now obtain the second least significant bit of $d$ by calling
the algorithm with either $f'$ or $f''$. By continuing this process, we can find all the bits of $d$.
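The bit-by-bit reduction can be checked on a toy oracle. The sketch below is only an illustration: `make_hiding_function`, `lsb_by_brute_force`, `restrict` and `recover_d` are hypothetical names, and the brute-force search is a classical stand-in for the quantum subroutine. The sketch assumes the convention that $f$ takes equal values on $(0, x)$ and $(1, x + d \bmod N)$, and in the odd case it restricts via $f(a, 2b + a)$, which pairs $(0, 2b)$ with $(1, 2b + d)$.

```python
def make_hiding_function(d, N):
    """A toy oracle on D_N, constant exactly on the pairs {(0, x), (1, x + d mod N)}."""
    def f(a, b):
        return b % N if a == 0 else (b - d) % N
    return f


def lsb_by_brute_force(f, N):
    """Classical stand-in for the quantum subroutine: the least significant bit of d."""
    d = next(x for x in range(N) if f(1, x) == f(0, 0))
    return d & 1


def restrict(f, bit):
    """Build the oracle on D_{N/2} once the current least significant bit is known."""
    if bit == 0:
        return lambda a, b: f(a, 2 * b)      # hides (1, d / 2)
    return lambda a, b: f(a, 2 * b + a)      # hides (1, (d - 1) / 2)


def recover_d(f, N, lsb_solver):
    """Recover d (N a power of two) using only calls that reveal one low bit."""
    d, weight = 0, 1
    while N > 1:
        bit = lsb_solver(f, N)
        d += bit * weight                    # record the bit at its position
        f = restrict(f, bit)                 # move to the oracle on D_{N/2}
        N //= 2
        weight *= 2
    return d
```

For example, `recover_d(make_hiding_function(37, 64), 64, lsb_by_brute_force)` makes six calls to the solver, one per bit of $d$.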

Hence, in the following we show how to obtain the least significant bit of $d$. We start with
a simple quantum routine that produces certain one-qubit states. First, we create the uniform
superposition over all elements of $D_N$. Namely, we create the state

\[ \sum_{b,x} |b, x\rangle \]

where $b$ ranges over $\{0, 1\}$ and $x$ ranges over $\{0, \ldots, N - 1\}$. Here and in the following we omit
the normalizing factor. We now add some qubits and call the oracle. The resulting state is

\[ \sum_{b,x} |b, x\rangle |f(b, x)\rangle. \]

After measuring the last register, the state collapses to

\[ |0, x\rangle + |1, x + d \bmod N\rangle \]

for some arbitrary $x$. We perform a standard (Abelian) Fourier transform on the second register
and obtain

\[ \sum_{y=0}^{N-1} \exp(2\pi i xy/N) |0, y\rangle + \sum_{y=0}^{N-1} \exp(2\pi i (x + d)y/N) |1, y\rangle. \]

Finally, we measure $y$ and obtain the one-qubit state

\[ |0\rangle + \exp(2\pi i dy/N) |1\rangle. \]

Notice that $y$ is distributed uniformly on $\{0, 1, \ldots, N - 1\}$ and is known to us.
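The last two steps of this routine can be checked numerically for small $N$. The sketch below writes down the (unnormalized) amplitudes after the Fourier transform and extracts the one-qubit state left for a given $y$. The function names are hypothetical and the code only tracks amplitudes; it is not a quantum simulation of the oracle call.

```python
import cmath


def phase(t, N):
    """Return exp(2*pi*i*t/N)."""
    return cmath.exp(2j * cmath.pi * t / N)


def fourier_of_coset_state(x, d, N):
    """Unnormalized amplitudes of |b, y> after the Fourier transform is applied
    to the second register of |0, x> + |1, (x + d) mod N>."""
    return {(b, y): phase(((x + b * d) % N) * y, N)
            for b in (0, 1) for y in range(N)}


def qubit_given_y(amplitudes, y):
    """One-qubit state left after measuring y, with the global phase removed."""
    a0, a1 = amplitudes[(0, y)], amplitudes[(1, y)]
    return (1 + 0j, a1 / a0)
```

For every $y$ the two amplitudes have the same magnitude, which is why $y$ is uniformly distributed, and their ratio is $\exp(2\pi i dy/N)$ regardless of $x$.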

The above routine can be seen as the 'machine' in the classical scenario described above.
Namely, an object with label $y \in \{0, \ldots, 2^n - 1\}$ is simply the one-qubit state $|0\rangle + \exp(2\pi i dy/N)|1\rangle$.
Combining two objects is done as follows. Given $|0\rangle + \exp(2\pi i d y_1/N)|1\rangle$ and $|0\rangle + \exp(2\pi i d y_2/N)|1\rangle$
we tensor them together and obtain

\[ |00\rangle + \exp(2\pi i d y_1/N)|10\rangle + \exp(2\pi i d y_2/N)|01\rangle + \exp(2\pi i d(y_1 + y_2)/N)|11\rangle. \]

We now measure the parity of the two qubits. With probability 50%, we measure 'odd' and the
state collapses to

\[ \exp(2\pi i d y_1/N)|10\rangle + \exp(2\pi i d y_2/N)|01\rangle. \]

By omitting the global phase and renaming the basis states, this is equivalent to

\[ |0\rangle + \exp(2\pi i d(y_2 - y_1)/N)|1\rangle, \]

as required. Hence, we can apply the algorithm described above and obtain, after $2^{O(\sqrt{n})}$
operations, the state

\[ |0\rangle + \exp(2\pi i d\, 2^{n-1}/N)|1\rangle = |0\rangle + \exp(\pi i d)|1\rangle. \]

Measuring this state in the Hadamard basis yields the least significant bit of d. 
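The parity-measurement combination and the final Hadamard-basis measurement can be reproduced with explicit amplitudes. The sketch below is a minimal illustration with hypothetical function names. Qubits are stored as unnormalized amplitude pairs, and the 'even' outcome is treated as a failure, following the simplified description above.

```python
import cmath
import random


def qubit(y, d, N):
    """The object with label y: amplitudes of |0> + exp(2*pi*i*d*y/N)|1>."""
    return (1 + 0j, cmath.exp(2j * cmath.pi * d * y / N))


def combine(q1, q2, rng):
    """Tensor two qubits and measure their parity.

    Returns the combined qubit on the 'odd' outcome and None otherwise.
    """
    a00, a01 = q1[0] * q2[0], q1[0] * q2[1]
    a10, a11 = q1[1] * q2[0], q1[1] * q2[1]
    total = sum(abs(a) ** 2 for a in (a00, a01, a10, a11))
    p_odd = (abs(a01) ** 2 + abs(a10) ** 2) / total
    if rng.random() >= p_odd:
        return None                     # 'even' outcome: both objects are lost
    # Rename |10> -> |0> and |01> -> |1>, then drop the global phase.
    return (1 + 0j, a01 / a10)


def prob_minus(q):
    """Probability of the |-> outcome when measuring q in the Hadamard basis."""
    a0, a1 = q
    return abs(a0 - a1) ** 2 / (2 * (abs(a0) ** 2 + abs(a1) ** 2))
```

Combining `qubit(y1, d, N)` with `qubit(y2, d, N)` yields, on success, the amplitudes of `qubit(y2 - y1, d, N)`. For the label $N/2$, `prob_minus` is 0 when $d$ is even and 1 when $d$ is odd, up to floating-point error.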

3 A Polynomial Space Algorithm 

In this section we present our new algorithm. As can be seen from the above description, each
routine has to store $\approx 2^{\sqrt{n}}$ objects (i.e., qubits) before a collision is found. Hence, the space
requirement is $2^{O(\sqrt{n})}$. The space requirement of the algorithm we present in this section is only
polynomial. The running time is only slightly larger, namely, $2^{O(\sqrt{n \log n})}$.

For simplicity, assume that $n = 1 + kl$ where $k = O(\sqrt{n/\log n})$ and $l = O(\sqrt{n \log n})$ are both
integers (we could also take $n = 1 + k^2$ as before but this would lead to a slightly worse running
time). Our algorithm is based on a different combination operation. This operation takes as input
$l + 4$ labelled objects whose labels are uniformly distributed and with constant probability outputs
one object that has its $l$ least significant bits zeroed out. This operation is performed as follows.
Assume our input is

\[ |0\rangle + \exp(2\pi i \cdot d\, y_j/N)|1\rangle, \quad j = 1, \ldots, l + 4. \]

We tensor together all these qubits and obtain

\[ \sum_{b \in \{0,1\}^{l+4}} \exp(2\pi i \cdot d \cdot \langle b, y\rangle/N)|b\rangle \]

where $y$ denotes $(y_1, \ldots, y_{l+4})$ and $\langle b, y\rangle$ denotes $\sum_j b_j y_j$. Since we know $y_1, \ldots, y_{l+4}$ we can compute
$\langle b, y\rangle \bmod 2^l$ in an extra register and obtain

\[ \sum_{b \in \{0,1\}^{l+4}} \exp(2\pi i \cdot d \cdot \langle b, y\rangle/N)|b\rangle |\langle b, y\rangle \bmod 2^l\rangle. \]

We now measure the second register and obtain some value $z \in \{0, \ldots, 2^l - 1\}$. We then compute
(classically) the number $m$ of bit strings $b \in \{0, 1\}^{l+4}$ for which $\langle b, y\rangle \bmod 2^l = z$. This is done in
a brute-force way and hence takes time $O(2^l)$. If $m$ is less than two or more than, say, 32, then we
say that the combination operation failed. Otherwise, the state that we have is

\[ \sum_{i=1}^{m} \exp(2\pi i \cdot d \cdot \langle b^i, y\rangle/N)|b^i\rangle \]

where $b^1, \ldots, b^m \in \{0, 1\}^{l+4}$ are the bit strings that we found. We would like to remain with exactly
two terms in the above sum. So we perform a projective measurement on the subspace spanned
by $|b^1\rangle$ and $|b^2\rangle$ (we can do this since we know the $b^i$'s). With constant probability (namely
$2/m \ge 1/16$, since all $m$ terms have equal weight) we have the state

\[ \exp(2\pi i \cdot d \cdot \langle b^1, y\rangle/N)|b^1\rangle + \exp(2\pi i \cdot d \cdot \langle b^2, y\rangle/N)|b^2\rangle. \]

By omitting the global phase and renaming, we obtain the one-qubit state

\[ |0\rangle + \exp(2\pi i \cdot d \cdot \langle b^2 - b^1, y\rangle/N)|1\rangle. \]

This is exactly the object whose label is $\langle b^2 - b^1, y\rangle$. Since $\langle b^1, y\rangle \bmod 2^l = \langle b^2, y\rangle \bmod 2^l = z$, the $l$
least significant bits of this label are all zero.
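The classical bookkeeping of this combination operation can be sketched as follows. This is an illustration with hypothetical names (`inner`, `combine_block`), not a quantum simulation. Measuring the second register returns $z$ with probability proportional to the number of strings $b$ mapped to it, which the sketch reproduces by evaluating a uniformly random $b$. The projective measurement is modelled as succeeding with probability $2/m$.

```python
import itertools
import random


def inner(b, y):
    """The integer inner product <b, y> = sum_j b_j * y_j."""
    return sum(bj * yj for bj, yj in zip(b, y))


def combine_block(y, l, n, rng):
    """Combine l + 4 labels into one label whose l low bits are zero.

    Returns the new label modulo 2^n, or None if the operation fails.
    """
    assert len(y) == l + 4
    block = 1 << l
    # Simulated measurement of the register holding <b, y> mod 2^l.
    z = inner([rng.randrange(2) for _ in y], y) % block
    # Brute-force search for all bit strings consistent with the outcome z.
    matches = [b for b in itertools.product((0, 1), repeat=l + 4)
               if inner(b, y) % block == z]
    m = len(matches)
    if m < 2 or m > 32:
        return None                     # too few or too many strings: failure
    if rng.random() >= 2 / m:
        return None                     # projection onto span{|b^1>, |b^2>} failed
    b1, b2 = matches[0], matches[1]
    return (inner(b2, y) - inner(b1, y)) % (1 << n)
```

Every label returned by `combine_block` is a multiple of $2^l$, and only the $l + 4$ input labels and at most 32 matching strings are kept at any time.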

It remains to show why the event that $m \in \{2, 3, \ldots, 32\}$ happens with constant probability
over the choice of $y$. Fix some $z \in \{0, \ldots, 2^l - 1\}$. For each $b \in \{0, 1\}^{l+4} \setminus \{0^{l+4}\}$ we define an
indicator random variable $X_b$ that is 1 if $\langle b, y\rangle \bmod 2^l = z$ and 0 otherwise (for convenience we
ignore the all zero string since $\langle 0^{l+4}, y\rangle$ is always zero). Each random variable has expected value
$2^{-l}$ and variance $2^{-l} - 2^{-2l}$. These random variables are pairwise independent. Let $Y = \sum X_b$ over
all $b \in \{0, 1\}^{l+4} \setminus \{0^{l+4}\}$. Its expected value is $16 - 2^{-l} \approx 16$. Its variance is $16 - 17 \cdot 2^{-l} + 2^{-2l} \approx 16$. By
Chebyshev's inequality we obtain that $Y \in \{2, 3, \ldots, 32\}$ with some constant probability. Hence,
the expected fraction of $z$'s that have this number of $b$'s mapped to them is constant. Hence, the
above procedure is successful with constant probability.
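The Chebyshev step can be made explicit. The following worked instance rests on two choices of ours that the text does not fix: we assume $l \ge 1$ and use the deviation threshold 14. The variance of the sum equals the sum of the variances because the $X_b$ are pairwise independent.

```latex
\begin{align*}
\mathbb{E}[Y] &= (2^{l+4} - 1)\, 2^{-l} = 16 - 2^{-l} \in [15.5,\, 16), \\
\mathrm{Var}[Y] &= (2^{l+4} - 1)\,(2^{-l} - 2^{-2l}) = 16 - 17 \cdot 2^{-l} + 2^{-2l} < 16, \\
\Pr\bigl[\, |Y - \mathbb{E}[Y]| \ge 14 \,\bigr] &\le \frac{\mathrm{Var}[Y]}{14^2} < \frac{16}{196} = \frac{4}{49}.
\end{align*}
```

Whenever $|Y - \mathbb{E}[Y]| < 14$ we have $1.5 < Y < 30$, so the integer $Y$ lies in $\{2, \ldots, 29\}$. Under these assumptions, $Y \in \{2, \ldots, 32\}$ holds with probability greater than $45/49 \approx 0.92$.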

We note that a similar combination operation can be performed on other $l$-bit blocks. For
example, given $l + 4$ objects whose labels have their $l$ least significant bits all zero and the next $l$
bits (from location $l + 1$ to $2l$) are uniformly distributed, we can extract with constant probability
one object such that its label has its $2l$ least significant bits zero.

Using this combination operation, we can now describe our new algorithm. As before, the
algorithm operates as a pipeline of $k$ routines. The output of routine $i$ consists of objects whose
labels have their $il$ least significant bits zeroed out. Unlike the previous algorithm, there is no need
for a pile. Each routine simply waits until it receives $l + 4$ objects from the previous routine and
then it uses the combination operation to obtain one object with $l$ additional bits zeroed out. Recall
that with constant probability the combination operation is successful. By using the Chernoff
bound, one can show that if we input $l^{O(k)} = 2^{O(\sqrt{n \log n})}$ objects to the pipeline then with very
high probability, the last routine outputs at least one object. Each combination operation takes
$2^{O(l)} = 2^{O(\sqrt{n \log n})}$ time and hence the total running time is also $2^{O(\sqrt{n \log n})}$.
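One way to see the polynomial space bound is to run the pipeline depth-first: each routine requests objects from the previous one only when it needs them. The Python sketch below is our own classical label-level illustration, not the paper's implementation. The names `combine_block` and `run_polyspace_pipeline` are hypothetical, and the measurement outcomes are simulated with a pseudorandom generator. At any moment at most $l + 4$ labels are held per level, i.e., $O(kl) = O(n)$ labels in total.

```python
import itertools
import random


def combine_block(labels, l, shift, modulus, rng):
    """Classical bookkeeping of one combination on bits shift .. shift + l - 1."""
    def block_of(b):
        total = sum(bj * yj for bj, yj in zip(b, labels))
        return (total >> shift) % (1 << l)

    z = block_of([rng.randrange(2) for _ in labels])    # simulated measurement of z
    matches = [b for b in itertools.product((0, 1), repeat=len(labels))
               if block_of(b) == z]
    m = len(matches)
    if not 2 <= m <= 32 or rng.random() >= 2 / m:
        return None                                     # the operation failed
    b1, b2 = matches[0], matches[1]
    diff = sum((x2 - x1) * yj for x1, x2, yj in zip(b1, b2, labels))
    return diff % modulus


def run_polyspace_pipeline(k, l, seed=0, max_outputs=1000):
    """Return (label 2^(n-1), number of machine queries), or None on giving up."""
    n = 1 + k * l
    modulus = 1 << n
    rng = random.Random(seed)
    stats = {"queries": 0}

    def produce(level):
        """Return one label whose level * l low bits are zero."""
        if level == 0:
            stats["queries"] += 1
            return rng.randrange(modulus)               # fresh object from the machine
        while True:
            batch = [produce(level - 1) for _ in range(l + 4)]
            out = combine_block(batch, l, (level - 1) * l, modulus, rng)
            if out is not None:
                return out

    for _ in range(max_outputs):
        label = produce(k)                              # either 0 or 2^(n-1)
        if label == modulus // 2:
            return label, stats["queries"]
    return None
```

For instance, `run_polyspace_pipeline(2, 3)` uses $n = 7$ and returns the label $2^6$ together with the number of fresh objects consumed. The recursion depth is $k$, so no pile is ever built up.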